-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gh 5121 fedx bind left join #5122
base: develop
Are you sure you want to change the base?
Conversation
Refactor the existing logic for executing bind joins into a reusable base class. This change mostly moves the implementation logic from the existing ControlledWorkerBindJoin class to a new intermediate implementation (with the goal to make it reusable in a second step for left joins). Note that the bind join implementation no longer uses the ControlledWorkerJoin as base class, i.e. the decision of which join implementation to use is moved to the strategy.
Prepare to execute a specific implementation of a left join implementation through the federation strategy.
This change provides the implementation and activation for the left bind join operator. The algorithm is as follows: - execute left bind join using regular bound join query - process result iteration similar to BoundJoinVALUESConversionIteration - remember seen set of bindings (using index) and add original bindings to those, i.e. put to result return all non-seen bindings directly from the input Note that the terminology in literature has changed to "bind joins". Hence, for new classes and methods I try to follow that. Change is covered with a unit tests
return configuredBindJoinSize; | ||
} | ||
|
||
protected interface TaskCreator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, moving this interface up to the base class seems to cause a class incompatbility:
Sep 06, 2024 12:24:10 PM japicmp.output.incompatible.IncompatibleErrorOutput warn
WARNING: Incompatibility detected: Requires semantic version level MAJOR: JApiMethod [oldMethod=org.eclipse.rdf4j.federated.evaluation.join.ControlledWorkerBoundJoin.determineTaskCreator(org.eclipse.rdf4j.query.algebra.TupleExpr,org.eclipse.rdf4j.query.BindingSet), newMethod=org.eclipse.rdf4j.federated.evaluation.join.ControlledWorkerBoundJoin.determineTaskCreator(org.eclipse.rdf4j.query.algebra.TupleExpr,org.eclipse.rdf4j.query.BindingSet), returnType=JApiReturnType [oldReturnTypeOptional=org.eclipse.rdf4j.federated.evaluation.join.ControlledWorkerBoundJoin$TaskCreator, newReturnTypeOptional=org.eclipse.rdf4j.federated.evaluation.join.ControlledWorkerBindJoinBase$TaskCreator, changeStatus=MODIFIED], getCompatibilityChanges()=[METHOD_RETURN_TYPE_CHANGED]]
Sep 06, 2024 12:24:10 PM japicmp.output.incompatible.IncompatibleErrorOutput warn
WARNING: Incompatibility detected: Requires semantic version level MAJOR: JApiMethod [oldMethod=org.eclipse.rdf4j.federated.evaluation.join.ControlledWorkerBoundJoin.setSubmitFirstResultImmediately(boolean), newMethod=n.a., returnType=JApiReturnType [oldReturnTypeOptional=void, newReturnTypeOptional=value absent, changeStatus=REMOVED], getCompatibilityChanges()=[METHOD_REMOVED]]
Sep 06, 2024 12:24:10 PM japicmp.output.incompatible.IncompatibleErrorOutput warn
WARNING: Incompatibility detected: Requires semantic version level MAJOR: JApiSuperclass [jApiClass=JApiClass [fullyQualifiedName=org.eclipse.rdf4j.federated.evaluation.join.ControlledWorkerBoundJoin, changeStatus=MODIFIED, compatibilityChanges=[METHOD_DEFAULT_ADDED_IN_IMPLEMENTED_INTERFACE]], oldSuperclass=org.eclipse.rdf4j.federated.evaluation.join.ControlledWorkerJoin, newSuperclass=org.eclipse.rdf4j.federated.evaluation.join.ControlledWorkerBindJoinBase, changeStatus=MODIFIED, compatibilityChanges=[SUPERCLASS_REMOVED]]
I will try to find a solution that remeains binary compatible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you might create the interface of the same name in the parent class and extend this one from that? This should preserve compatibility.
/** | ||
* A {@link LookAheadIteration} for processing bind left join results. | ||
* | ||
* Algorithm: | ||
* | ||
* <ul> | ||
* <li>execute left bind join using regular bound join query</li> | ||
* <li>process result iteration similar to {@link BoundJoinVALUESConversionIteration}</li> | ||
* <li>remember seen set of bindings (using index) and add original bindings to those, i.e. put to result return all | ||
* non-seen bindings directly from the input</li> | ||
* | ||
* | ||
* @author Andreas Schwarte | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add that this is used for OPTIONAL clauses. Is it used for anything else?
protected BindingSet convert(BindingSet bIn, int bIndex) throws QueryEvaluationException { | ||
QueryBindingSet res = new QueryBindingSet(); | ||
Iterator<Binding> bIter = bIn.iterator(); | ||
while (bIter.hasNext()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be a for-each look?
for (Binding bs : bIn){
...
}
int bIndex = Integer.parseInt( | ||
bIn.getBinding(BoundJoinVALUESConversionIteration.INDEX_BINDING_NAME).getValue().stringValue()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the binding Value object ever a literal? So it would be possible to check if it's instanceof IntegerLiteral or something like that?
|
||
protected final ControlledWorkerScheduler<BindingSet> scheduler; | ||
|
||
protected final Phaser phaser = new Phaser(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen in other code that we call phaser.forceTermination()
in the handleClose
method. Would that be wise here too?
protected final CloseableIteration<BindingSet> iter; | ||
protected final List<BindingSet> bindings; | ||
|
||
protected Set<Integer> seenBindingIndexes = new HashSet<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the IntHashSet be appropriate here?
@hmottestad thanks for the feedback. I will get back to this PR soon (we are currently in the release finalization phase of our product and this requires most of my attention). Quick question: is there a timeline for the 5.1 release? I would like to see this change included (if I get it ready) such that we can make use of it in our next release. Ideally the 5.1 release would be available in the second half of november. What are the current project plans? |
We don't have any plans at the moment. I'm currently working on the last SHACL paths that aren't supported. I also have a PR for configuring the Apache HttpClient timeouts. And there is a request for adding support for configuring the document loader in the new JSON-LD 1.1 parser. Would really want the timeouts included, and if I start on the JSON-LD document loader then it shouldn't take more than a few weeks to get it merged. So we can say mid november for RDF4J 5.1. |
Note: The PR is still WIP (some further changes, and especially unit test coverage will follow) however it is ready for early review. The goal is to have this available in RDF4J 5.1
GitHub issue resolved: #5121
Briefly describe the changes proposed in this PR:
The PR consists of multiple commits which add an implementation for bind left joins. It is recommended to follow the change commit by commit (as the first ones are preparational refactoring)
Commit 1: GH-5121: refactor the bind join logic into a reusable base class
Refactor the existing logic for executing bind joins into a reusable
base class.
This change mostly moves the implementation logic from the existing
ControlledWorkerBindJoin class to a new intermediate implementation
(with the goal to make it reusable in a second step for left joins).
Note that the bind join implementation no longer uses the
ControlledWorkerJoin as base class, i.e. the decision of which join
implementation to use is moved to the strategy.
Commit 2: GH-5121: prepare execution of left joins in the federation strategy
Prepare to execute a specific implementation of a left join
implementation through the federation strategy.
Commit 3: GH-5121: implementation of left bind join operator
This change provides the implementation and activation for the left bind
join operator.
The algorithm is as follows:
to those, i.e. put to result return all non-seen bindings directly from
the input
Note that the terminology in literature has changed to "bind joins".
Hence, for new classes and methods I try to follow that.
Change is covered with a unit tests
TODOs
PR Author Checklist (see the contributor guidelines for more details):
mvn process-resources
to format from the command line)